 flexible expression


Described Object Detection: Liberating Object Detection with Flexible Expressions

Neural Information Processing Systems

Detecting objects based on language information is a popular task that includes Open-Vocabulary Object Detection (OVD) and Referring Expression Comprehension (REC). In this paper, we advance them to a more practical setting called *Described Object Detection* (DOD) by expanding category names to flexible language expressions for OVD and overcoming the limitation that REC only grounds pre-existing objects. We establish the research foundation for DOD by constructing a *Description Detection Dataset* ($D^3$). This dataset features flexible language expressions, whether short category names or long descriptions, and annotates all described objects on all images without omission. By evaluating previous SOTA methods on $D^3$, we identify several challenges that cause current REC, OVD, and bi-functional methods to fail. REC methods struggle with confidence scores, rejecting negative instances, and multi-target scenarios, while OVD methods are constrained by long and complex descriptions. Recent bi-functional methods also do not work well on DOD because their training procedures and inference strategies for the REC and OVD tasks are separate. Building on these findings, we propose a baseline that substantially improves REC methods by reconstructing the training data and introducing a binary classification sub-task, outperforming existing methods. Data and code are available at https://github.com/shikras/d-cube



Flexible expressions could lift 3D-generated faces out of the uncanny valley – TechCrunch

#artificialintelligence

Disney Research is working on ways to smooth out this process, among them a machine learning tool that makes it much easier to generate and manipulate 3D faces without dipping into the uncanny valley. Of course, this technology has come a long way from the wooden expressions and limited details of earlier days. High-resolution, convincing 3D faces can be animated quickly and well, but the subtleties of human expression are not just limitless in variety, they're very easy to get wrong. Think of how someone's entire face changes when they smile -- it's different for everyone, but there are enough similarities that we fancy we can tell when someone is "really" smiling or just faking it. How can you achieve that level of detail in an artificial face?